Abstract

In the online checkpointing problem, the task is to continuously maintain
a set of $k$ checkpoints that allow rewinding an ongoing computation
faster than by a full restart. The only operation allowed is to replace an
old checkpoint by the current state. We aim for checkpoint placement
strategies that minimize the rewinding cost, i.e., such that at all times $T$,
when requested to rewind to some time $t \le T$, the number of computation
steps that need to be redone to get to $t$ from a checkpoint before $t$ is as
small as possible. In particular, we want the closest checkpoint earlier
than $t$ to be no further away from $t$ than $q_k$ times the ideal distance
$T/(k+1)$, where $q_k$ is a small constant.

Improving over earlier work showing $1 + 1/k \le q_k \le 2$, we show
that $q_k$ can be chosen asymptotically less than 2. We present algorithms
with asymptotic discrepancy $q_k \le 1.59 + o(1)$ valid for all $k$ and
$q_k \le \ln(4) + o(1) \le 1.39 + o(1)$ valid for $k$ being a power of two.
Experiments indicate the uniform bound $q_k \le 1.7$ for all $k$. For small $k$,
we show how to use a linear programming approach to compute good
checkpointing algorithms. This gives discrepancies of less than 1.55 for all
$k < 60$.

We prove the first lower bound that is asymptotically more than one,
namely $q_k \ge 1.30 - o(1)$. We also show that optimal algorithms (yielding
the infimum discrepancy) exist for all $k$.

1 Introduction 

Checkpointing means storing selected intermediate states of a long sequence of 
computations. This allows reverting the system to an arbitrary previous state 
much faster, since only the computations from the preceding checkpoint have 
to be redone. Checkpointing is one of the fundamental techniques in computer 
science. Classic results date back to the seventies [5]; more recent topics are
checkpointing in distributed j^, sensor network j?], or cloud [10] architectures.

*Karl Bringmann is a recipient of the Google Europe Fellowship in Randomized Algorithms, 
and this research is supported in part by this Google Fellowship. 
Checkpointing usually involves a careful trade-off between the speed-up
of reversions to previous states and the costs incurred by setting checkpoints
(time, memory). Much of the classic literature (see [5] and the references therein)
studies checkpointing with the focus of gaining fault tolerance against immediately 
detectable faults. Consequently, only reversions to the most recent checkpoint 
are needed. However, setting a checkpoint can be highly time consuming, 
because the whole system state has to be copied to secondary memory. In 
such scenarios, the central question is how often to set a checkpoint such that 
the expected time spent on setting checkpoints and redoing computations from 
the last checkpoint is minimized (under a stochastic failure model and further, 
possibly time-dependent assumptions on the cost of setting a checkpoint). 

In this work, we consider a checkpointing problem of a different nature.
If fault tolerance of the system is not the aim of checkpointing, then the
checkpoints can often be kept in main memory. Applications of this type arise
in data compression [2] and numerics [6, 8]. In such scenarios, the cost of
setting a checkpoint is small compared to the cost of the regular computation.
Consequently, the memory used by the stored checkpoints is the bottleneck.

The first to provide an abstract framework independent of a particular
application were Ahlroth, Pottonen and Schumacher [1]. They do not
make assumptions on which reversion to previous states will be requested, but 
simply investigate how checkpoints can be set in an online fashion such that at 
all times their distribution is balanced over the total computation history. 

They assume that the system is able to store up to k checkpoints (plus 
a free checkpoint at time 0). At any point in time, a previous checkpoint 
may be discarded and replaced by the current system state as new checkpoint. 
Costs incurred by such a change are ignored. However, as it turns out, good 
checkpointing algorithms do not set checkpoints very often. For all algorithms
discussed in the remainder of this paper, each checkpoint is changed only
$O(\log T)$ times up to time $T$.

The max-ratio discrepancy measure. Each set of checkpoints, together
with the current state and the state at time 0, partitions the time from the
process start to the current time $T$ into $k + 1$ disjoint intervals. Clearly,
without further problem-specific information, an ideal set of checkpoints would
lead to all these intervals having identical length. Of course, this is not
possible at all points in time due to the restriction that new checkpoints can
only be set at the current time. As discrepancy measure for a checkpointing
algorithm, Ahlroth et al. mainly regard the maximum gap ratio, that is, the
maximum ratio of the longest interval vs. the shortest interval (ignoring the
last interval, which can be arbitrarily small), over all current times $T$. They
show that there is a simple algorithm achieving a discrepancy of two: Start with
all checkpoints placed evenly, e.g., at times $1, \dots, k$. At an even time $T$,
remove one of the checkpoints at an odd time and place it at $T$. This will lead
to all checkpoints being at the even times $2, 4, \dots, 2k$ when $T = 2k$ is
reached. Since these checkpoints form a scaled copy of the initial ones, we can
continue in this fashion forever. It is easy to see that at all times, the
intervals formed by neighboring checkpoints have at most two different lengths,
the larger being twice the smaller in case that not all lengths are equal. This
shows the discrepancy of two.
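
The doubling strategy described above can be checked with a small simulation.
The following is only a sketch under assumptions not fixed by the text: the
function name `doubling_max_gap_ratio` is hypothetical, and among the
checkpoints at odd times it always removes the oldest one (the text allows any
of them).

```python
def doubling_max_gap_ratio(k, rounds=5):
    """Simulate the doubling strategy and return the worst gap ratio seen."""
    unit = 1                                   # current grid spacing
    checkpoints = [unit * i for i in range(1, k + 1)]
    worst = 1.0
    for _ in range(rounds):
        # "Even" times of this round: multiples of 2*unit in (k*unit, 2k*unit].
        T = 2 * unit * (k // 2 + 1)
        while T <= 2 * k * unit:
            # Assumption: remove the oldest checkpoint at an odd multiple of unit.
            odd = next(c for c in checkpoints if (c // unit) % 2 == 1)
            checkpoints.remove(odd)
            checkpoints.append(T)
            # Gaps between 0 and the stored checkpoints; the last interval
            # [T, T] is empty and ignored, as in the text.
            gaps = [b - a for a, b in zip([0] + checkpoints, checkpoints)]
            worst = max(worst, max(gaps) / min(gaps))
            T += 2 * unit
        unit *= 2                              # the state is now a scaled copy
        assert checkpoints == [unit * i for i in range(1, k + 1)]
    return worst
```

For example, `doubling_max_gap_ratio(5)` tracks the ratio over five rounds of
doubling; the internal assertion confirms that each round ends in a scaled copy
of the initial placement.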

It seems tempting to believe that one can do better, but, in fact, not
much improvement is possible for general $k$ as shown by the lower bound
of $2^{1 - 1/\lceil (k+1)/2 \rceil} = 2(1 - o(1))$. For small values of $k$,
namely $k = 2, 3, 4$, and 5, better upper bounds of approximately 1.414, 1.618,
1.755, and 1.755, respectively, were shown.

The maximum distance discrepancy measure. In this work, we shall
regard a different, and, as we find, more natural discrepancy measure. Recall
that the actual cost of reverting to a particular state is basically the cost of
redoing the computation from the preceding checkpoint to the desired point
in time. Adopting a worst-case view on the time to revert to, our aim is to
keep the length of the longest interval small (at all times). Note that with
time progressing, the interval lengths necessarily grow. Hence a fair point of
comparison is the length $T/(k+1)$ of a longest interval in the (at time $T$)
optimal partition of the time frame into equal length intervals. For this
reason, we say that a checkpointing algorithm (using $k$ checkpoints) has
maximum distance discrepancy (or simply discrepancy) $q$ if it places the
checkpoints in such a way that at all times $T$, the longest interval has length
at most $qT/(k+1)$. We denote by $q^*(k)$ the infimum discrepancy among all
checkpointing algorithms using $k$ checkpoints.

This maximum distance discrepancy measure was suggested in [1]. There it
was remarked that an upper bound of $\beta$ for the gap-ratio discrepancy implies
an upper bound of $\beta(1 + 1/k)$ for the maximum distance discrepancy.
Furthermore, for all $k$ an upper bound of 2 and a lower bound of $1 + 1/k$ is
shown for $q^*(k)$. For $k = 2, 3, 4$, and 5, stronger upper bounds of 1.785,
1.789, 1.624, and 1.565, respectively, were shown.

Our results. In this work, we show that the optimal discrepancy $q^*(k)$ is
asymptotically bounded away from both one and two by a constant. We present
algorithms that achieve a discrepancy of $1.59 + O(1/k)$ for all $k$ (Theorem 2),
and a discrepancy of $\ln(4) + o(1) \le 1.39 + o(1)$ for $k$ being any power of
two (Theorem 3). For small values of $k$, and this might be an interesting case
in applications with memory-consuming states, we show superior bounds by
suggesting a class of checkpointing algorithms and optimizing their parameters
via a combination of exhaustive search and linear programming (Table 1).
Experiments suggest $q^*(k) < 1.7$ for all $k$ (Sect. 6). We complement these
constructive results by a lower bound for $q^*(k)$ of
$2 - \ln(2) - O(1/k) \ge 1.3 - O(1/k)$ (Theorem 6). We round off this work with
a natural, but seemingly nontrivial result: We show that for each $k$ there is
indeed a checkpointing algorithm having discrepancy $q^*(k)$ (Theorem 4). In
other words, the infimum in the definition of $q^*(k)$ can be replaced by a
minimum.
2 Notation and Preliminaries



In the checkpointing problem with k checkpoints, we consider a long running 
computation during which we can choose to save the state at the current time T 
in a checkpoint, or delete a previously placed one. We assume that our storage 
can hold at most $k$ checkpoints simultaneously, and that there are implicit
checkpoints at time $t = 0$ and the current time. We disregard any costs for
placing or maintaining checkpoints. Consequently, we may assume that we only 
delete a previous checkpoint when a new one is placed. 

An algorithm for checkpoint placement can be described by two infinite
sequences. First, the time points where new checkpoints are placed, i.e., a
non-decreasing infinite sequence of reals $t_1 \le t_2 \le \dots$ such that
$\lim_{i \to \infty} t_i = \infty$, and second, a rule that describes which old
checkpoints to delete when a new one is installed, that is, an injective
function $d : [k+1..\infty) \to \mathbb{N}$ satisfying $d_i < i$ for all
$i \ge k + 1$.

The algorithm $A$ described by $(t, d)$ will start with $t_1, \dots, t_k$ as
initial checkpoints and then for each $i \ge k + 1$, at time $t_i$ remove the
checkpoint at $t_{d_i}$ and set a new checkpoint at the current time $t_i$. We
call the act of removing a checkpoint and placing a new one a step of $A$. Note
that there is little point in setting the first $k$ checkpoints to zero, so to
make the following discrepancy measure meaningful, we shall always require that
$t_k > 0$.

We call the set of checkpoints that exist at time $T$ active. The active
checkpoints, together with the two implicit checkpoints at times 0 and $T$,
define a sequence of $k + 1$ interval lengths
$\mathcal{L}_T = (\ell_0, \dots, \ell_k)$. The discrepancy $q(A, T)$ of an
algorithm $A$ at time $T > 0$ is a measure of how long the maximal interval
is, normalized to be one if all intervals have the same length. It is
calculated as

$$q(A, T) := (k + 1)\, \bar{\ell}_T / T,$$

where $\bar{\ell}_T = \|\mathcal{L}_T\|_\infty$ denotes the length of the longest
interval. We also use the term discrepancy when we refer to the scaled length
of a single interval.

The discrepancy $\mathrm{Perf}(A)$ of an algorithm $A$ then is the supremum of
the discrepancy over all times $T$, i.e.,

$$\mathrm{Perf}(A) := \sup_{T \ge t_k} q(A, T).$$

Hence the discrepancy of an algorithm would be 1 if it kept its checkpoints
evenly distributed at all times. Denote the infimum discrepancy of a
checkpointing algorithm using $k$ checkpoints by

$$q^*(k) := \inf_A \mathrm{Perf}(A),$$

where $A$ runs over all algorithms using $k$ checkpoints. We will see in Sect. 7
that algorithms achieving this discrepancy actually exist.
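
As a concrete reading of the definition of $q(A, T)$, the following sketch
computes the discrepancy of a given set of active checkpoints at time $T$. The
function name `discrepancy` and the list-based representation are illustrative
choices, not part of the paper.

```python
def discrepancy(active, T):
    """q(A, T) = (k + 1) * (longest interval) / T for the active checkpoints."""
    k = len(active)
    points = [0.0] + sorted(active) + [T]      # implicit checkpoints at 0 and T
    longest = max(b - a for a, b in zip(points, points[1:]))
    return (k + 1) * longest / T
```

With three checkpoints at times 1, 2, 3 and current time 4, all four intervals
have the same length and `discrepancy([1, 2, 3], 4)` equals 1, matching the
normalization in the text.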

Note that we allow checkpointing algorithms to set checkpoints at continuous
time points. One can convert any such algorithm to an algorithm with integral
checkpoints by rounding all checkpointing times $t_i$ down. This does not
increase the discrepancy since
$\lfloor t_i \rfloor - \lfloor t_{i-1} \rfloor \le t_i - t_{i-1} + 1$, but with
discrete time there are at most
$\lfloor t_i \rfloor - \lfloor t_{i-1} \rfloor - 1$ steps to recompute in this
interval.

In the definition of the discrepancy, the supremum is never attained at some
$T$ with $t_i < T < t_{i+1}$ for any $i$, as shown in the following lemma.

Lemma 1. In the definition of the discrepancy it suffices to consider times
$T = t_i$ for all $i \ge k$, i.e., we have

$$\mathrm{Perf}(A) = \sup_{i \ge k} q(A, t_i).$$

Proof. Consider a time $T$ with $t_i < T < t_{i+1}$ for any $i \ge k$. We show
that

$$q(A, T) \le \max\{q(A, t_i), q(A, t_{i+1})\}.$$

Denote the active checkpoints at time $T$ by $x_1, \dots, x_k$. Note that
$x_k = t_i$, since $t_i$ was the last time we set a checkpoint. Consider the
interval $[x_k, T]$. Its discrepancy is exactly

$$(k + 1) \frac{T - x_k}{T} \le (k + 1) \frac{t_{i+1} - x_k}{t_{i+1}} \le q(A, t_{i+1}).$$

Any other interval at time $T$ is of the form $[x_{j-1}, x_j]$ for some
$1 \le j \le k$ (where we set $x_0 := 0$), whose discrepancy is

$$(k + 1) \frac{x_j - x_{j-1}}{T} \le (k + 1) \frac{x_j - x_{j-1}}{t_i} \le q(A, t_i).$$

Together, this proves the claim. □

To bound the discrepancy of an algorithm we need to bound the largest of
the $q(A, t_i)$ over all $i \ge k$. For this purpose, it suffices to look at the
two newly created intervals at time $t_i$ for each $i$, as made explicit by the
following lemma.

Lemma 2. Let $i > k$ and let $\ell_1, \ell_2$ be the lengths of the two newly
created intervals at time $t_i$ due to the removal and the insertion of a
checkpoint. Then

$$\max\{q(A, t_{i-1}), q(A, t_i)\} = \max\{q(A, t_{i-1}), (k+1)\ell_1/t_i, (k+1)\ell_2/t_i\}.$$

Proof. If $\ell_1$ or $\ell_2$ is the longest interval at time $t_i$, the claim
holds. Any other interval existed already at time $t_{i-1}$ and had a larger
discrepancy at this time, as we divide by the current time to compute the
discrepancy. Thus, if any other interval is the longest at time $t_i$, then we
have $q(A, t_{i-1}) > q(A, t_i)$ and the claim holds again. □

Often, it will be useful to use a different notation for the checkpoint that
is removed in step $i$. Instead of the global index $d$, one can also use the
index $p : [k+1..\infty) \to [1..k]$ among the active checkpoints, i.e.,

$$p_i = d_i - |\{ j \in [i-1] \mid d_j < d_i \}|.$$
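
The translation from global indices $d_i$ to indices $p_i$ among the active
checkpoints can be sketched as follows. The function name `active_indices` and
the dictionary representation of $d$ (step number to removed global index) are
assumptions made for illustration; the set $\{j : d_j < d_i\}$ is taken over
the earlier steps for which $d_j$ is defined.

```python
def active_indices(d):
    """Map a removal rule d (dict: step i -> global index d_i) to indices p_i."""
    p = {}
    removed = []                     # global indices removed in earlier steps
    for i in sorted(d):              # steps k+1, k+2, ...
        # Subtract the number of earlier removals with a smaller global index.
        p[i] = d[i] - sum(1 for g in removed if g < d[i])
        removed.append(d[i])
    return p
```

For instance, a rule that always removes the oldest checkpoint with $k = 3$
has $d_i = i - 3$ and yields $p_i = 1$ in every step.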
We call an algorithm $A = (t, p)$ cyclic if the $p_i$ are periodic with some
period $n$, i.e., $p_i = p_{i+n}$ for all $i$, and after $n$ steps $A$ has
transformed the intervals to a scaled version of themselves, that is,
$\mathcal{L}_{t_{i+jn}} = \gamma^j \mathcal{L}_{t_i}$ for some $\gamma > 1$ and
all $j \in \mathbb{N}$. We call $\gamma$ the scaling factor. For a cyclic
algorithm $A$, it suffices to fix the pattern of removals
$P = (p_{k+1}, \dots, p_{k+n})$ and the checkpoint positions
$t_1, \dots, t_k, t_{k+1}, \dots, t_{k+n}$. Since our discrepancy notion is
invariant under scaling, we can assume without loss of generality that
$t_k = 1$ (and hence $t_{k+n} = \gamma$).

Since cyclic algorithms transform the starting position to a scaled copy of
itself, it is easy to see that their discrepancy is given by the maximum over
the discrepancies during one period, i.e., for cyclic algorithms $A$ with
period $n$ we have

$$\mathrm{Perf}(A) = \max_{k < i \le k+n} q(A, t_i).$$

This makes this class of algorithms easy to analyze.

3 Introductory Example: A Simple Bound for k = 3

For the case of $k = 3$ there is a very simple algorithm, Simple, with a
discrepancy of $4/\phi^2 \approx 1.53$, where $\phi = (\sqrt{5} + 1)/2$ is the
golden ratio. Because the algorithm is so simple, we use it to familiarize
ourselves with the notation we introduced in Sect. 2. The algorithm is cyclic
with a pattern of length one. We prove the following theorem.

Theorem 1. For $k = 3$ there is a cyclic algorithm Simple with period length
one and

$$\mathrm{Perf}(\mathrm{Simple}) = \frac{4}{\phi^2}.$$

Proof. We fix the pattern to be $P = (1)$, that is, algorithm Simple always
removes the oldest checkpoint. For this simple pattern it is easy to calculate
the discrepancy depending on the scaling factor $\gamma$. Since the intervals
need to be a scaled copy of themselves after just one step and we can fix
$t_3 = 1$, we know immediately that

$$t_1 = \frac{1}{\gamma^2}, \quad t_2 = \frac{1}{\gamma}, \quad t_3 = 1, \quad t_4 = \gamma,$$

and hence the discrepancy is determined by

$$4 \max\left\{ \frac{t_1}{t_3}, \frac{t_2 - t_1}{t_3}, \frac{t_3 - t_2}{t_3} \right\} = 4 \max\left\{ \frac{1}{\gamma^2}, \frac{\gamma - 1}{\gamma^2}, \frac{\gamma - 1}{\gamma} \right\}.$$

Since $\gamma > 1$, the second term is always smaller than the third and can be
ignored. As $1/\gamma^2$ is decreasing and $(\gamma - 1)/\gamma$ is increasing,
the maximum is minimal when they are equal. Simple calculation shows this to be
the case at $\gamma = \phi$.

Hence for $k = 3$ the algorithm with pattern (1) and checkpoint positions
$t_1 = 1/\phi^2$, $t_2 = 1/\phi$, $t_3 = 1$, and $t_4 = \phi$ has discrepancy
$4/\phi^2 \approx 1.53$. □
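
The optimization over the scaling factor in this proof can be reproduced
numerically. The sketch below evaluates the discrepancy of pattern (1) for
$k = 3$ as a function of the scaling factor and searches a coarse grid; the
names `simple_discrepancy`, `grid`, and `best` as well as the grid resolution
are illustrative assumptions.

```python
import math

def simple_discrepancy(gamma):
    """Discrepancy of pattern (1) for k = 3 with t_3 = 1 and scaling factor gamma."""
    t1, t2, t3 = 1 / gamma**2, 1 / gamma, 1.0
    # Intervals [0, t1], [t1, t2], [t2, t3] at time t3, scaled by k + 1 = 4.
    return 4 * max(t1, t2 - t1, t3 - t2) / t3

phi = (math.sqrt(5) + 1) / 2
# Coarse grid over gamma in (1, 3); the minimizer should lie close to phi.
grid = [1 + j / 1000 for j in range(1, 2000)]
best = min(grid, key=simple_discrepancy)
print(best, simple_discrepancy(phi), 4 / phi**2)
```

The grid minimizer agrees with the golden ratio up to the grid resolution, and
the discrepancy at the golden ratio matches the closed form of Theorem 1.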
[Figure 1: One period of the algorithm Linear from Sect. 4 for k = 5, from
T = 1 (step 0) to T = 2.46 (step 5). After one period all intervals are scaled
by the same factor.]

The experiments in Sect. 6 indicate that for $k = 3$ this is optimal among all
cyclic algorithms with a period of length at most 6.



4 A Simple Upper Bound for Large k 

In this section we present an algorithm, Linear, with a discrepancy of roughly
1.59 for large $k$. This improves upon the asymptotic bound of 2 from [1].
Moreover, Linear is easily implemented for all $k$.

Like the algorithm Simple of the previous section, the algorithm Linear is
cyclic. It has a simple pattern of length $k$. The pattern is just
$(1, \dots, k)$, that is, at the $i$-th step of a period Linear deletes the
$i$-th active checkpoint. Overall, during one period Linear removes all
checkpoints at times $t_i$ with odd index $i$, as shown in Fig. 1.

This removal pattern is identical to the one of the Powers-Of-Two algorithm
from [1]. However, that algorithm starts with a uniform checkpoint distribution
where removing any checkpoint doubles the maximum interval. This leads to
an asymptotic discrepancy of two. In contrast, Linear places checkpoints on
a polynomial. For $i \in [1, 2k]$ we set $t_i = (i/k)^\alpha$, where $\alpha$ is
a constant. In the analysis we optimize the choice of $\alpha$ and set
$\alpha := 1.302$. For this algorithm we show the following theorem.

Theorem 2. Algorithm Linear has a discrepancy of at most

$$\mathrm{Perf}(\mathrm{Linear}) \le 1.586 + O(k^{-1}).$$

Experiments show that the discrepancy of algorithm Linear is close to the
bound of 1.586 even for moderate sizes of $k$. Comparisons using the
optimization method from Sect. 6 indicate that for the pattern
$(1, \dots, k - 1)$ of algorithm Linear, different checkpoint placements can
yield only improvements of about 4.5% for large $k$. Experimental results are
summarized in Fig. 4.

Proof. As algorithm Linear is cyclic, we can again compute the discrepancy
from the $2k$ checkpoint positions and the pattern,

$$\mathrm{Perf}(\mathrm{Linear}) = \max_{k < i \le 2k} (k + 1)\, \bar{\ell}_{t_i} / t_i,$$

where $\bar{\ell}_{t_i}$ is the length of the longest interval at time $t_i$. By
Lemma 2 it suffices to consider newly created intervals at times
$t_{k+1}, \dots, t_{2k}$. Note that at time $t_i$ we create the intervals
$[t_{i-1}, t_i]$ (from insertion of a checkpoint at $t_i$) and
$[t_{2(i-k)-2}, t_{2(i-k)}]$ (from deletion of the checkpoint at
$t_{2(i-k)-1}$). The discrepancy of the new interval by insertion is, for
$k < i \le 2k$,

$$(k + 1) \frac{t_i - t_{i-1}}{t_i} = (k + 1) \frac{i^\alpha - (i-1)^\alpha}{i^\alpha} \le (k + 1) \frac{(k+1)^\alpha - k^\alpha}{(k+1)^\alpha}.$$

Using $(x + 1)^c - x^c \le c (x + 1)^{c-1}$ for any $x \ge 0$ and $c \ge 1$,
this simplifies to

$$\le (k + 1) \frac{\alpha (k+1)^{\alpha - 1}}{(k+1)^\alpha} = \alpha$$

for any constant $\alpha \ge 1$.

For the new interval from deleting the checkpoint at $t_{2(i-k)-1}$ we get a
discrepancy of

$$(k + 1) \frac{t_{2(i-k)} - t_{2(i-k)-2}}{t_i} = (k + 1) \frac{(2(i-k))^\alpha - (2(i-k) - 2)^\alpha}{i^\alpha} \le (k + 1)\, 2^\alpha \frac{\alpha (i - k)^{\alpha - 1}}{i^\alpha},$$

where we used again $(x + 1)^c - x^c \le c (x + 1)^{c-1}$. An easy computation
shows that $(i - k)^{\alpha - 1} / i^\alpha$ is maximized at $i = \alpha k$ over
$k < i \le 2k$. Hence, we can upper bound this discrepancy by

$$\le \left(1 + \frac{1}{k}\right) 2^\alpha \frac{(\alpha - 1)^{\alpha - 1}}{\alpha^{\alpha - 1}} = 2^\alpha \left(1 - \frac{1}{\alpha}\right)^{\alpha - 1} + O(k^{-1}).$$

We optimize the latter term numerically and obtain for $\alpha = 1.302$ an upper
bound of

$$1.586 + O(k^{-1}).$$

Note that this bound is larger than the bound $\alpha = 1.302$ from the new
intervals from insertion. Hence, overall we get the desired upper bound. □
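
The bound of Theorem 2 can be compared against a direct simulation of one
period of Linear. The following is a minimal sketch, assuming the positions
$t_i = (i/k)^\alpha$ and the pattern $(1, \dots, k)$ from this section; the
function name `linear_perf` is hypothetical.

```python
def linear_perf(k, alpha=1.302):
    """Largest discrepancy of Linear over one period, by direct simulation."""
    t = {i: (i / k) ** alpha for i in range(1, 2 * k + 1)}
    active = list(range(1, k + 1))          # global indices, kept sorted
    worst = 0.0
    for step in range(1, k + 1):
        active.pop(step - 1)                # pattern (1, ..., k): drop the step-th active one
        active.append(k + step)             # new checkpoint at t_{k+step}
        T = t[k + step]
        points = [0.0] + [t[i] for i in active]
        longest = max(b - a for a, b in zip(points, points[1:]))
        worst = max(worst, (k + 1) * longest / T)
    return worst
```

For moderately large $k$ (say `linear_perf(200)`), the simulated value can be
compared with the asymptotic bound $1.586 + O(k^{-1})$.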



5 An Improved Upper Bound for Large k 

In this section we present the algorithm Binary that yields a discrepancy of
roughly $\ln(4) \approx 1.39$ for large $k$. Compared to the algorithm Linear
from the last section, Binary has a considerably better discrepancy at the
price of a more involved analysis, and it only works for $k$ being a power of
two.

Theorem 3. For $k \ge 8$ being any power of 2, the algorithm Binary has
discrepancy

$$\mathrm{Perf}(\mathrm{Binary}) \le \ln(4) + \frac{0.05}{\lg(k/4)} + O\left(\frac{1}{k}\right).$$
Here and in the remainder of this paper, let 'lg' denote the binary and 'ln'
the natural logarithm. Note that the term $O(1/k)$ quickly tends to 0, whereas
the $\Theta(1/\lg(k/4))$ term is small due to the constant 0.05. Hence, this
discrepancy is close to $\ln(4)$ already for moderate $k$. Also note that
$\ln(4)$ exceeds our lower bound from Sect. 8 by less than 0.1, leaving room
for less than a 6% improvement over the upper bound for algorithm Binary for
large $k$. We verified experimentally that algorithm Binary yields very good
bounds already for relatively small $k$. The results are summarized in Fig. 5.

5.1 The Algorithm Binary 

The initial checkpoints $t_1, \dots, t_k$ satisfy the equation

$$t_i = \alpha\, t_{i/2} \qquad (1)$$

for each even $1 \le i \le k$ and some $\alpha = \alpha(k) \ge 2$. Precisely, we
set

$$\alpha := 2^{1 + \lg(\sqrt{2}/\ln 4)/\lg(k/4)} \approx 2^{1 + 0.029/\lg(k/4)}.$$

However, the usefulness of this expression becomes clear only in the analysis of
the algorithm.

During one period we delete all odd checkpoints $t_1, t_3, \dots, t_{k-1}$ and
insert the new checkpoints

$$t_{k+i} := \alpha\, t_{k/2+i}, \qquad (2)$$

for $1 \le i \le k/2$. Then after one period we end up with the checkpoints

$$(t_2, t_4, \dots, t_{k-2}, t_k, t_{k+1}, t_{k+2}, \dots, t_{k+k/2}) = \alpha \cdot (t_1, t_2, \dots, t_{k/2}, t_{k/2+1}, t_{k/2+2}, \dots, t_{k/2+k/2}) = \alpha \cdot (t_1, t_2, \dots, t_k),$$

which proves cyclicity. Note that (1) and (2) allow us to compute all $t_i$ from
the values $t_{k/2+1}, \dots, t_k$; however, we still have some freedom to choose
the latter values. Without loss of generality we can set $t_k := 1$, then
$t_{k/2} = \alpha^{-1}$. In between these two values, we interpolate $\lg t_i$
linearly, i.e., we set for $i \in (k/2, k]$

$$t_i := \alpha^{2i/k - 2}, \qquad (3)$$

completing the definition of the $t_i$. Note that this equation also works for
$i = k$ and $i = k/2$.

There is one more freedom we have with this algorithm, namely in which
order we delete all odd checkpoints during one period, i.e., we need to fix the
pattern of removals. In iteration $1 \le i \le k/2$ we insert the checkpoint
$t_{k+i}$ and remove the checkpoint $t_{d(k+i)}$, defined as follows. For
$m \in \mathbb{N} = \mathbb{N}_{\ge 1}$ let $2^{e(m)}$ be the largest power of 2
that divides $m$. We define $S : \mathbb{N} \to \mathbb{N}$,
$S(m) := m / 2^{e(m)}$. Note that $S(m)$ is an odd integer. Using this
definition, we set

$$d(k + i) := S(i + k/2), \qquad (4)$$
[Figure 2: One period of the algorithm Binary for k = 16, from T = 1 (step 0)
to T = 2.012 (step 8). Note that, recursively, checkpoints are removed twice as
often from the right half of the initial setting (at steps i where
i mod 2 = 1) as from the second quarter.]

finishing the definition of the algorithm Binary. If we write this down as a
pattern, then we have $p_i = 1 + k/2^{e(i)+1}$ for $1 \le i < k/2$ and
$p_{k/2} = 1$. For intuition as to the behavior of this pattern, see the example
in Fig. 2. The following lemma implies that the deletion behavior of Binary is
indeed well-defined, meaning that during one period we delete all odd
checkpoints $t_1, t_3, \dots, t_{k-1}$ (and no point is deleted twice).

Lemma 3. The function $S$ induces a bijection between $\{k/2 < i \le k\}$ and
$\{1 \le i \le k \mid i \text{ is odd}\}$.

Proof. Let $A := \{k/2 < i \le k\}$ and
$B := \{1 \le i \le k \mid i \text{ is odd}\}$. Since $S(m) \le m$ and $S(m)$ is
odd for all $m \in \mathbb{N}$, we have $S(A) \subseteq B$. Moreover, $A$ and
$B$ are of the same size. We present an inverse function to finish the proof.
Let $x \in B$. Note that there is a unique integer $y \ge 0$ such that
$x 2^y \in A$, since $A$ is a range between two consecutive powers of 2 and
$x < k$. Setting $S^{-1}(x) = x 2^y$ we have found the inverse. □
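
Lemma 3 and the pattern form of the deletion rule can be checked mechanically
for small powers of two. The sketch below is illustrative only; the helper
names `S` (mirroring the function of the same name in the text) and
`check_binary_pattern` are assumptions.

```python
def S(m):
    """Odd part of m: divide out the largest power of 2 dividing m."""
    while m % 2 == 0:
        m //= 2
    return m

def check_binary_pattern(k):
    """Check Lemma 3 for a power of two k and return the pattern p_1..p_{k/2}."""
    removed = [S(i + k // 2) for i in range(1, k // 2 + 1)]   # d(k + i)
    assert sorted(removed) == list(range(1, k, 2))            # each odd index exactly once
    active = list(range(1, k + 1))
    pattern = []
    for i, g in enumerate(removed, start=1):
        pattern.append(active.index(g) + 1)                   # position among active checkpoints
        active.remove(g)
        active.append(k + i)
    return pattern
```

For $k = 16$ the returned pattern can be compared step by step with the
removals described for one period of Binary.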

5.2 Discrepancy Analysis 

We now bound the largest discrepancy encountered during one period, i.e.,

$$\mathrm{Perf}(\mathrm{Binary}) = \max_{1 \le i \le k/2} q(\mathrm{Binary}, t_{i+k}) = (k + 1) \max_{1 \le i \le k/2} \bar{\ell}_{t_{i+k}} / t_{i+k}.$$

We first compute the maximum and later multiply with the factor $k + 1$. By
Lemma 2 we only have to consider intervals newly created by insertion and
deletion at any step.

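Intervals from Insertion: We first compute the discrepancy of the interval
newly added at time $t_{i+k}$, $1 \le i \le k/2$. Its length is
$t_{i+k} - t_{i+k-1}$, so its discrepancy (without the factor $k + 1$) is

$$\frac{t_{i+k} - t_{i+k-1}}{t_{i+k}} = 1 - \frac{t_{i+k-1}}{t_{i+k}} = 1 - \frac{t_{i+k/2-1}}{t_{i+k/2}} = 1 - \alpha^{-2/k},$$

where the second equality holds because of (2) if $i > 1$ or (1) if $i = 1$,
and the last one follows from (3). Using $e^x \ge 1 + x$ for
$x \in \mathbb{R}$ yields a bound on the discrepancy of

$$\frac{t_{i+k} - t_{i+k-1}}{t_{i+k}} \le \ln(\alpha) \cdot \frac{2}{k} = \frac{\ln(\alpha^2)}{k}.$$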



Deleting $t_1$: We show similar bounds for the intervals we get from deleting
an old checkpoint. We first analyze the deletion of $t_1$; this case is
different from the general one, since $t_1$ has no predecessor. Note that $t_1$
is deleted at time $t_{3k/2}$. The deletion of $t_1$ creates the interval
$[0, t_2]$. This interval has discrepancy

$$\frac{t_2 - 0}{t_{3k/2}} = \frac{t_2}{\alpha\, t_k} = \alpha^{-\lg k} \le 1/k,$$

since we choose $\alpha \ge 2$. Hence, this discrepancy is dominated by the one
we get from newly inserted intervals.



Other Intervals from Deletion: It remains to analyze the discrepancy of
the intervals we get from deletion in the general case, i.e., at some time
$t_{i+k}$, $1 \le i < k/2$. At this time we delete checkpoint $d(i + k)$, so we
create the interval $[t_{d(i+k)-1}, t_{d(i+k)+1}]$ of discrepancy

$$q_i := \frac{t_{d(i+k)+1} - t_{d(i+k)-1}}{t_{i+k}} = \frac{t_{S(i+k/2)+1} - t_{S(i+k/2)-1}}{\alpha\, t_{i+k/2}},$$

where the equality uses (2) and (4). Let $h := e(i + k/2)$, so that $2^h$ is the
largest power of 2 dividing $i + k/2$, and $2^h S(i + k/2) = i + k/2$. Then
$t_{S(i+k/2)+1} = \alpha^{-h} t_{i+k/2+2^h}$ by (1), and a similar statement
holds for $t_{S(i+k/2)-1}$, yielding

$$q_i = \alpha^{-h-1}\, \frac{t_{i+k/2+2^h} - t_{i+k/2-2^h}}{t_{i+k/2}}.$$

Using (3) we get $t_{i+k/2} = \alpha^{2i/k - 1}$. Comparing this with the
respective terms for $t_{i+k/2+2^h}$ and $t_{i+k/2-2^h}$ yields

$$q_i = \alpha^{-h-1} \left( \alpha^{2^{h+1}/k} - \alpha^{-2^{h+1}/k} \right) = \alpha^{-h-1} \cdot 2 \sinh\left( \ln(\alpha^2)\, 2^h / k \right).$$
By elementary means one can show that the function
$f(x) = x^{-A} \sinh(Bx)$, $A \ge 1$, $B > 0$, is convex on $\mathbb{R}_{>0}$.
Since convex functions have their maxima at the boundaries of their domain, and
since by the above equation $q_i$ can be expressed using $f(2^h)$ (for
$A = \lg \alpha$ and $B = \ln(\alpha^2)/k$), we see that $q_i$ is maximal at
(one of) the boundaries of $h$. Recall that we treated $i = k/2$ separately,
and observe that the largest power of 2 dividing $i + k/2$, $1 \le i < k/2$, is
at most $k/4$. Hence, we have $1 \le 2^h \le k/4$ and

$$q_i \le \max\left\{ 2\alpha^{-1} \sinh(\ln(\alpha^2)/k),\; 2\alpha^{-1} (k/4)^{-\lg \alpha} \sinh(\ln(\alpha)/2) \right\}.$$

We simplify using $\alpha \ge 2$ and $\sinh(x) = x + O(x^2)$ to get

$$q_i \le \max\left\{ \ln(\alpha^2)/k + O(1/k^2),\; (k/4)^{-\lg \alpha} \sinh(\ln(\alpha)/2) \right\}. \qquad (5)$$

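The first term is already of the desired form. For the second one, note that
setting $\alpha = 2$ we would get a discrepancy of
$4 \sinh(\ln(2)/2)/k = \sqrt{2}/k$. We get a better bound by choosing

$$\alpha := 2^{1 + c/\lg(k/4)},$$

with $c := \lg(\sqrt{2}/\ln(4)) \approx 0.029$. Then the second bound on $q_i$
from above becomes

$$(k/4)^{-\lg \alpha} \sinh(\ln(\alpha)/2) = \frac{4}{k}\, 2^{-c} \sinh\left( \frac{\ln 2}{2} \left( 1 + \frac{c}{\lg(k/4)} \right) \right).$$

The particular choice of $c$ allows us to bound the derivative of
$\sinh((1 + x) \ln(2)/2)$ for $x \in [0, c]$ from above by

$$\frac{\ln 2}{2} \cosh((1 + c) \ln(2)/2) < 0.39.$$

Hence, we can upper bound

$$\sinh\left( \frac{\ln 2}{2} \left( 1 + \frac{c}{\lg(k/4)} \right) \right) \le \sinh(\ln(2)/2) + \frac{0.39\, c}{\lg(k/4)}.$$

Thus, in total the second bound on $q_i$ from inequality (5) becomes

$$(k/4)^{-\lg \alpha} \sinh(\ln(\alpha)/2) \le \frac{4}{k}\, 2^{-c} \sinh(\ln(2)/2) + \frac{4 \cdot 2^{-c} \cdot 0.39\, c}{k \lg(k/4)}.$$

Since $c = \lg(\sqrt{2}/\ln(4)) = \lg(4 \sinh(\ln(2)/2)/\ln(4))$, this becomes

$$\le \ln(4)/k + 0.044/(k \lg(k/4)).$$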

Overall discrepancy: In total, we can bound the discrepancy
$q := \mathrm{Perf}(\mathrm{Binary})$ of our algorithm (now including the factor
of $k + 1$) by

$$q \le (k + 1) \max\left\{ \ln(\alpha^2)/k + O(1/k^2),\; \ln(4)/k + 0.044/(k \lg(k/4)) \right\}.$$

Using $(k + 1)/k = 1 + O(1/k)$ and

$$\ln(\alpha^2) = \ln(4) \left( 1 + \frac{c}{\lg(k/4)} \right) \le \ln(4) + \frac{0.040}{\lg(k/4)},$$

this bound can be simplified to

$$q \le \max\left\{ \ln(4) + 0.040/\lg(k/4) + O(1/k),\; \ln(4) + 0.044/\lg(k/4) + O(1/k) \right\},$$

which proves Theorem 3.
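
To see how the bound of Theorem 3 compares with the actual behavior, one period
of Binary can be simulated directly. The following is a sketch under the
definitions (1) to (4) of Sect. 5.1; the function name `binary_perf` and its
internal helpers are hypothetical, and $k$ is assumed to be a power of two with
$k \ge 8$.

```python
import math

def binary_perf(k):
    """Largest discrepancy of Binary over one period (k a power of two, k >= 8)."""
    c = math.log2(math.sqrt(2) / math.log(4))
    alpha = 2 ** (1 + c / math.log2(k / 4))

    def t(i):
        # Positions: (3) on (k/2, k], extended downward by (1) and upward by (2).
        if i > k:
            return alpha * t(i - k // 2)
        if i <= k // 2:
            return t(2 * i) / alpha
        return alpha ** (2 * i / k - 2)

    def odd_part(m):
        while m % 2 == 0:
            m //= 2
        return m

    pos = {j: t(j) for j in range(1, 3 * k // 2 + 1)}
    active = list(range(1, k + 1))
    worst = 0.0
    for i in range(1, k // 2 + 1):
        active.remove(odd_part(i + k // 2))   # deletion rule (4)
        active.append(k + i)                  # insertion (2)
        points = [0.0] + [pos[j] for j in active]
        longest = max(b - a for a, b in zip(points, points[1:]))
        worst = max(worst, (k + 1) * longest / pos[k + i])
    return worst
```

Evaluating `binary_perf(256)`, for example, gives a value that can be set
against $\ln(4) + 0.05/\lg(k/4)$ for the same $k$.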

6 Upper Bounds via Combinatorial Optimization 

In this section we show how to find upper bounds on the optimal discrepancy
$q^*(k)$ for fixed $k$. We do so by constructing cyclic algorithms using
exhaustive enumeration of all short patterns in the case of very small $k$ or
randomized local search on the patterns for larger $k$, combined with linear
programming to optimize the checkpoint positions. This yields good algorithms
as summarized in Table 1. In the following we describe our algorithmic approach.

Finding Checkpoint Positions: First we describe how to find a nearly
optimal cyclic algorithm given a pattern $P$ and a scaling factor $\gamma$,
i.e., how to optimize the checkpoint positions. To do so, we construct a linear
program that is feasible if a cyclic algorithm with discrepancy $\lambda$ and
scaling factor $\gamma$ exists. We use three kinds of constraints: We fix the
ordering of the checkpoints, enforce that the $i$-th active checkpoint after
one period is a factor $\gamma$ larger than the $i$-th initial checkpoint, and
upper bound the discrepancy of each interval during the period by $\lambda$. We
then use binary search to optimize $\lambda$.

Lemma 4. For a fixed pattern $P$ of length $n$ and scaling factor $\gamma$, let
$q^* = \inf_A \mathrm{Perf}(A)$ be the optimal discrepancy among algorithms $A$
using $P$ and $\gamma$. Then finding an algorithm with discrepancy at most
$q^* + \varepsilon$ reduces to solving $O(\log \varepsilon^{-1})$ linear
feasibility problems with $O(nk)$ inequalities and $k + n$ variables.

Proof. For a fixed pattern and scaling factor, we can tune the discrepancy of
the algorithm by cleverly choosing the time points when to remove an old
checkpoint and place a new one. By solving a linear feasibility problem we can
check whether a cyclic algorithm with scaling factor $\gamma$ and pattern $P$
exists that guarantees a discrepancy of at most $\lambda$. We can then optimize
over $\lambda$ to find an approximately optimal algorithm.

We construct a linear program with the $k + n$ time points
$(t_1, \dots, t_{k+n})$ as variables (where we can set $t_k = 1$ without loss of
generality). It uses three kinds of constraints. The first kind is of the form

$$t_i \le t_{i+1},$$

for all $i \in [1, k + n)$. These constraints are satisfied if the checkpoint
positions have the correct ordering, i.e., checkpoints with larger index are
placed at later times.
The second kind of constraints enforces the scaling factor. Since the pattern
is fixed, we can compute at all steps which checkpoints are active. For
$i \in [1, k]$ and $j \in [0, n]$, let $\tau_i^j$ be the variable of the $i$-th
active checkpoint in step $j$ and let $\tau_0^j$ be 0 for all $j$. It is easy to
see that the algorithm has a scaling factor of $\gamma$ if the $i$-th active
checkpoint in the last step is larger by a factor of $\gamma$ than in the first
step. We encode this as constraints of the form

$$\tau_i^n = \gamma\, \tau_i^0.$$

Lastly we encode an upper bound of $\lambda$ for the discrepancy. Since the
discrepancy of a cyclic algorithm is given by

$$\max_{k < i \le k+n} (k + 1)\, \bar{\ell}_{t_i} / t_i,$$

and each $\bar{\ell}_{t_i}$ can be expressed by a maximum over $k$ terms, we can
encode a discrepancy guarantee of $\lambda$ with $nk$ constraints of the form

$$\tau_{i+1}^j - \tau_i^j \le \lambda\, \tau_k^j / (k + 1),$$

for all $i \in [0, k)$ and $j \in [0, n]$.

A feasible solution of those constraints fixes the checkpoint positions and
hence, together with the pattern $P$, provides an algorithm with discrepancy at
most $\lambda$. Using a simple binary search over $\lambda \in [1, 2]$ we can
find an approximately optimal algorithm for this value of $\gamma$ and the
pattern $P$. □
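
The outer binary search of this proof can be sketched independently of any
particular LP solver. In the following, `is_feasible` is a hypothetical oracle
standing in for the linear feasibility problem (it is not implemented here),
and the function name `minimize_discrepancy` is likewise an assumption. The
sketch relies on feasibility being monotone: if a discrepancy bound is
feasible, so is every larger one.

```python
def minimize_discrepancy(is_feasible, eps, lo=1.0, hi=2.0):
    """Binary search for the smallest feasible lambda in [lo, hi] up to eps.

    is_feasible(lam) is a hypothetical oracle that decides the linear
    feasibility problem for the fixed pattern P and scaling factor gamma.
    Assumes that hi itself is feasible.
    """
    calls = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        calls += 1
        if is_feasible(mid):
            hi = mid            # discrepancy mid is achievable, try smaller
        else:
            lo = mid            # infeasible, the optimum lies above mid
    return hi, calls
```

The number of oracle calls grows like $\log_2(1/\varepsilon)$, in line with the
$O(\log \varepsilon^{-1})$ feasibility problems stated in Lemma 4.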



Finding Scaling Factors: Next we show how to find scaling factors $\gamma$ for
which algorithms with good discrepancy exist. We first show an upper bound
for $\gamma$.

Lemma 5. A cyclic algorithm with $k$ checkpoints, discrepancy $\lambda < k$,
and a period length of $n$ can have scaling factor at most

$$\gamma \le \left( \frac{1}{1 - \lambda/(k + 1)} \right)^n.$$

Proof. Consider any checkpointing algorithm $A = (t, d)$ with $k$ checkpoints
and discrepancy $\lambda$. At any time $t_i$, $i > k$, the largest interval has
length $\bar{\ell}_{t_i} \ge t_i - t_{i-1}$, as there is no checkpoint in the
time interval $(t_{i-1}, t_i)$. Hence, we have

$$\lambda \ge (k + 1)\, \bar{\ell}_{t_i} / t_i \ge (k + 1) \frac{t_i - t_{i-1}}{t_i}.$$

Rearranging, this yields

$$t_i \le \frac{1}{1 - \lambda/(k + 1)}\, t_{i-1}.$$

Iterating this $n$ times, we get

$$t_{i+n} \le \left( \frac{1}{1 - \lambda/(k + 1)} \right)^n t_i.$$
k      |     3     4     5     6     7     8     9    10    15    20    30    50   100
Discr. | 1.529 1.541 1.472 1.498 1.499 1.499 1.488 1.492 1.466 1.457 1.466 1.481 1.484

Table 1: Upper bounds for different $k$. For $k < 8$ all patterns up to length
$k$ were tried. For $k = 8$ all patterns up to length 7 were tried. For larger
$k$, patterns were found via randomized local search.

Hence, for any cyclic algorithm (with discrepancy $\lambda$, $k$ checkpoints, and a period
length of $n$) we get the desired bound on the scaling factor $\gamma = t_{k+n}/t_k$. □

Since algorithms with discrepancy 2 are known [1], we can restrict our
attention to $\lambda < 2$. Hence, for any given pattern length $n$, Lemma 5 yields an
upper bound on $\gamma$, while a trivial lower bound is given by $\gamma > 1$. Now, for any
given pattern $P$ we optimize over $\gamma$ using a linear search with a small step size
over the possible values for $\gamma$. For each tested $\gamma$, we optimize over the checkpoint
positions using the linear programming approach described above.
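
The search range for $\gamma$ can be sketched as follows. This is a minimal illustration, assuming the bound of Lemma 5 in the form $\gamma \le (1/(1 - \lambda/(k+1)))^n$; the function names and the default step size are hypothetical and not taken from the paper.

```python
def gamma_upper_bound(k, lam, n):
    """Bound in the style of Lemma 5: gamma <= (1 / (1 - lam / (k + 1))) ** n."""
    if not lam < k + 1:
        raise ValueError("the bound needs lam < k + 1")
    return (1.0 / (1.0 - lam / (k + 1))) ** n


def gamma_candidates(k, n, lam=2.0, step=0.01):
    """Evenly spaced candidate scaling factors g with 1 < g <= upper bound.

    The step size is an illustrative choice (assumption).
    """
    upper = gamma_upper_bound(k, lam, n)
    candidates = []
    i = 1
    while 1.0 + step * i <= upper:
        candidates.append(1.0 + step * i)
        i += 1
    return candidates
```

For example, `gamma_upper_bound(3, 2.0, 1)` returns `2.0`, so for $k = 3$ and period length 1 only scaling factors in $(1, 2]$ need to be tested; each candidate would then be handed to the linear program.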

Finding Patterns: For small $k$ and $n$, we can exhaustively enumerate all $k^n$
removal patterns of period length $n$. Some patterns can be discarded as they
obviously cannot lead to a good algorithm or are equivalent to some other pattern:
No pattern that never removes the first checkpoint can be cyclic. Furthermore,
patterns are equivalent under cyclic shifts, so we can assume without loss of
generality that all patterns end with removing the first checkpoint. Lastly,
it never makes sense to remove the currently last checkpoint. Hence, for $k$
checkpoints there are at most $(k - 1)^{n-1}$ interesting patterns of length $n$. This
finishes the description of our combinatorial optimization approach.
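
The enumeration of interesting patterns can be sketched in a few lines. This is an illustrative sketch under the three rules stated above (1-based indices, the last checkpoint is never removed, every pattern ends with removing checkpoint 1); the name `interesting_patterns` is hypothetical.

```python
from itertools import product


def interesting_patterns(k, n):
    """Yield removal patterns of length n over k checkpoints.

    Entries are 1-based checkpoint indices. The last checkpoint (index k)
    is never removed, and every pattern ends by removing checkpoint 1.
    """
    for prefix in product(range(1, k), repeat=n - 1):
        yield prefix + (1,)
```

For $k = 4$ and $n = 3$ this yields $(4 - 1)^{3 - 1} = 9$ patterns, matching the count given above.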

Results: We ran experiments that try patterns up to length $k$ for $k \in [3, 7]$.
For $k = 8$ we stopped the search after examining patterns of length 7. For larger
$k$ we used a randomized local search to find good patterns. The upper bounds
we found are summarized in Table 1, and for $k \le 8$ the removal patterns and
time points when to place new checkpoints can be found in Fig. 3. Note that for
$k = 3$ this procedure re-discovers the golden ratio algorithm of Sect. 3.

Note that we can combine the results presented in Table 1 with the algorithm
Linear (Theorem 2 and Fig. 4) to read off a global upper bound of $q^*(k) \le 1.7$
for the optimal discrepancy for any $k$.

For a fixed pattern the method is efficient enough to find good checkpoint
positions for much larger $k$. For $k \le 1000$ we experimentally compared the
algorithm Linear of Sect. 4 with algorithms found for its pattern $(1, \dots, k - 1)$.
The experiments show that for $k = 1000$ Linear is within 4.5% of the optimized
bounds. For the algorithm Binary of Sect. 5 this comparison is even more
favorable. For $k = 1024$ the algorithm places its checkpoints so well that the
optimization procedure improves discrepancy only by 1.9%. The results are
summarized in Fig. 4 and Fig. 5.
[Figure 3, six panels. k = 3: Pattern = 1; k = 4: Pattern = 3,1; k = 5: Pattern = 2,3,1;
k = 6: Pattern = 2,3,5,1,3,1; k = 7: Pattern = 3,4,1,5,3,1; k = 8: Pattern = 4,7,2,3,4,1.]

Figure 3: Time points where the i-th checkpoint is placed to achieve the bounds
of Table 1. Time is on the y-axis, iteration is on the x-axis.
Figure 4: The discrepancy of algorithm Linear from Sect. 4 for different values
of k compared with the upper bounds for its pattern found via the combinatorial
method from Sect. 6. For large k Linear is about 4.5% worse.

Figure 5: The discrepancy of the algorithm from Sect. 5 for some values of k,
compared with the upper bounds for its pattern found via the combinatorial
method from Sect. 6. For k = 1024, the optimization procedure finds a checkpoint
placement with only 1.9% better discrepancy.



Do we find optimal algorithms? One could ask whether the algorithms
from Table 1 are optimal, or at least near optimal. There are two steps in the above
optimization procedure that prevent us from answering this question positively. First,
we are only optimizing over short patterns, and it might be that much larger
pattern lengths are necessary for optimal checkpointing algorithms. Second,
we do not know how smoothly the optimal discrepancy for fixed pattern $P$
and scaling factor $\gamma$ behaves with varying $\gamma$, i.e., we do not know whether our
linear search for $\gamma$ yields any approximation of the discrepancy $\lambda$. However, in
experiments we tried all patterns of length $2k$ for $k \in \{3, 4, 5\}$ and found no better
algorithm than for the shorter patterns of length up to $k$. Moreover, smaller step
sizes in the linear search for $\gamma$ lead only to small improvements, indicating that
the discrepancy is continuous in $\gamma$. This suggests that the reported algorithms
might be near optimal.

7 Existence of Optimal Algorithms 

In this section, we prove that optimal algorithms for the checkpointing problem
exist, i.e., that there is an algorithm having discrepancy equal to the infimum
discrepancy $q^*(k) := \inf_A \mathrm{Perf}(A)$ among all algorithms for $k$ checkpoints.

Theorem 4. For each $k$ there exists a checkpointing algorithm $A$ for $k$ checkpoints
with $\mathrm{Perf}(A) = q^*(k)$, i.e., there is an optimal checkpointing algorithm.

As we will see throughout this section, this is a non-trivial statement. From
the proof of this statement, we gain additional insight into the behavior of good
algorithms. In particular, we show that we can assume without increasing
discrepancy that for all $i$ the $i$-th checkpoint is set by a factor of at least
$(1 + 1/k)^{\Theta(i)}$ later than the first checkpoint.

An initial set of checkpoints can be described by a vector $x = (x_1, \dots, x_k)$,
$0 \le x_1 \le \dots \le x_k$. Since $x = (0, \dots, 0)$ can never be extended to a checkpointing
algorithm of finite discrepancy, we shall always assume $x \ne 0$. Denote by $X$ the
set of all initial sets of checkpoints (described by vectors $x \ne 0$ as above), and
by $X_0$ the set of all $x \in X$ with $x_k = 1$.

We say that $A = (t, d)$ is an algorithm for an initial set $x \in X$ of checkpoints
if $t_i = x_i$ for all $i \in [k]$. We denote by $q(x) := \inf_A \mathrm{Perf}(A)$, where $A$ runs over
all algorithms for $x$, the discrepancy of $x$. An initial set $x \in X$ is called optimal
if $q(x) = \inf_{x' \in X} q(x') = q^*(k)$.

Lemma 6. Optimal initial sets of checkpoints exist. 

Proof. Since the discrepancy of an initial set of checkpoints is invariant under
scaling, that is, $q(x) = q(\lambda x)$ for all $x \in X$ and $\lambda > 0$, we have
$\inf_{x \in X} q(x) = \inf_{x \in X_0} q(x)$.

It is not hard to see that $q(\cdot)$ is continuous on $X_0$: Let $x, x' \in X_0$ with
$|x - x'|_\infty \le \varepsilon$ and consider an algorithm $A = (t, d)$ for $x$. We construct an
algorithm $A' = (t', d)$ for $x'$ by setting $t'_i = t_i$ for $i > k$. Then
$|\mathrm{Perf}(A) - \mathrm{Perf}(A')| \le 2\varepsilon$, since any interval's length is changed by at most
$2\varepsilon$. This implies $|q(x) - q(x')| \le 2\varepsilon$ and, thus, shows continuity of $q(\cdot)$.

Now, since $q(\cdot)$ is continuous on $X_0$ and $X_0$ is compact, there exists an
$x \in X_0$ such that $q(x) = \inf_{x' \in X_0} q(x') = q^*(k)$. □

An easy observation is that if some checkpointing algorithm leads to a vector
$x$ of checkpoints at some time, then we may continue from there using any other
algorithm for $x$. The discrepancy of this combined algorithm is at most the
maximum of the two discrepancies.

Lemma 7. Let $A = (t, d)$ be a checkpointing algorithm. Let $i \ge k$. We
call $q_{A,i} = \max_{j \in [k..i]} \bar{\ell}_{t_j}\,(k+1)/t_j$ the partial discrepancy of $A$ observed in
the time up to $t_i$. Assume that when running $A$, at time $t_i$ the checkpoints
$x = (x_1, \dots, x_k = t_i)$ are active. Let $A' = (t', d')$ be an algorithm for $x$. Then
the checkpointing algorithm obtained from running $A$ until time $t_i$ and then
continuing with algorithm $A'$ is a checkpointing algorithm that has discrepancy
at most $\max\{q_{A,i}, \mathrm{Perf}(A')\}$. If we run this combined algorithm only until some
time $t'_j$, then the partial discrepancy observed till then is $\max\{q_{A,i}, q_{A',j}\}$.

Proof. Trivial. □

The above lemma implies that in the following, we may instead of looking at 
an arbitrary time simply assume that the algorithm just started, that is, that 
the current set of checkpoints is the initial one. 

The following lemma shows that we can, without loss of discrepancy, assume
that an algorithm for the checkpointing problem does not set checkpoints too
close together. While also of independent interest, among other reasons because it
shows how to keep additional costs for setting and removing checkpoints low,
we shall need this statement in our proof that optimal checkpointing algorithms
exist.

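Lemma 8. Let $A = (t, d)$ be an algorithm for the checkpointing problem with
$\mathrm{Perf}(A) < k + 1$. Then there is an algorithm $A' = (t', d')$ with the same starting
position such that (i) $\mathrm{Perf}(A') \le \mathrm{Perf}(A)$ and

(ii) $t'_{k+3} \ge \left(1 + \frac{\mathrm{Perf}(A)}{k + 1 - \mathrm{Perf}(A)}\right) t_k \ge \left(1 + \frac{1}{k}\right) t_k$.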

Proof. Let $r = \mathrm{Perf}(A)/(k + 1 - \mathrm{Perf}(A))$ for convenience, and scale time such
that $t_k = 1$. By way of contradiction, assume that the lemma is false. Let $A$ be a
counter-example such that $i := \min\{i \in \mathbb{N} \mid t_{k+i} > 1 + r\}$ is minimal (the
minimum is well-defined, since for any algorithm the sequence $(t_i)_i$ tends to
infinity). Note that $i \ge 4$, since $A$ is a counter-example.

Assume that there is a $j \in [1..i-1]$ such that $t_{k+j}$ in the further run of $A$
is removed (and replaced by the then current time $t_x$) earlier than both $t_{k+j-1}$
and $t_{k+j+1}$. Consider the algorithm $A'$ that arises from $A$ by the following
modifications. Let $t_y$ be the checkpoint that was removed to install the checkpoint
$t_{k+j}$. Let $A'$ be the checkpointing algorithm that proceeds as $A$ except that $t_y$ is not
replaced by $t_{k+j}$, but by $t_x$, and $t_{k+j}$ is never created. The only interval which
could cause this algorithm to have a worse discrepancy than $A$ is $[t_{k+j-1}, t_{k+j+1}]$.
However, this interval contributes
$(k+1)(t_{k+j+1} - t_{k+j-1})/t_{k+j+1} \le (k+1)\,r/(1+r) \le \mathrm{Perf}(A)$
to the discrepancy of $A'$. Hence, $\mathrm{Perf}(A') \le \mathrm{Perf}(A)$ and $A'$ has
fewer checkpoints in the interval $[1, 1 + r]$, contradicting the minimality of $A$.
Thus, there is no $j \in [1..i-1]$ such that $t_{k+j}$ is removed earlier than both
$t_{k+j-1}$ and $t_{k+j+1}$ (*).

We now consider separately the two cases that $t_{k+1}$ is removed earlier than
$t_{k+i-2}$ and vice versa. Note first that $k + 1 < k + i - 2$ by the assumption that $i \ge 4$.

Assume first that $t_{k+1}$ is removed (at some time $t_x$) earlier than $t_{k+i-2}$. Then
$t_k$ must have been removed even earlier (at some time $t_y$), as otherwise we would have a
contradiction to (*). Let $A'$ be an algorithm working identically as $A$, except
that at time $t_y$ the checkpoint $t_{k+1}$ is removed (instead of $t_k$) and at time $t_x$
the checkpoint $t_k$ is removed (instead of $t_{k+1}$). Since the checkpoint at $t_{k+i-2}$ is
still present, the only interval affected by this exchange, namely the one with
$t_k$ as left endpoint, has length at most $r$. Hence as above, this contributes at
most $\mathrm{Perf}(A)$ to the discrepancy of $A'$. The algorithm $A'$ has the property that
there is a checkpoint in between $t_k$ and $t_{k+i-2}$ which is removed before these
two points. The earliest such checkpoint, call it $t_{k+j}$, has the property that $t_{k+j}$
is removed earlier than both $t_{k+j-1}$ and $t_{k+j+1}$, contradicting earlier arguments.

A symmetric argument shows that also $t_{k+i-2}$ being removed before $t_{k+1}$
leads to a contradiction. Consequently, our initial assumption that $i \ge 4$ cannot
hold, proving the claim. □

The following is a global variant of Lemma 8. It shows that any reasonable
checkpointing algorithm does not store new checkpoints too often.

Theorem 5. Let $A = (t, d)$ be a checkpointing algorithm with $\mathrm{Perf}(A) < k - 1$.
Then there is an algorithm $A' = (t', d')$ with the same starting position such that
(i) $\mathrm{Perf}(A') \le \mathrm{Perf}(A)$ and (ii) $t'_{i+3} \ge (1 + 1/k) \cdot t'_i$ for all $i \ge k$.

Proof. Let $j \ge k$ be the smallest index with a small jump, $t_{j+3} < (1 + 1/k)\,t_j$.
Using Lemma 8 (on the remainder of algorithm $A$ starting at time $t_j$) we can
remove this small jump and get an algorithm $A' = (t', d')$ with $\mathrm{Perf}(A') \le \mathrm{Perf}(A)$
and $t'_{i+3} \ge (1 + 1/k) \cdot t'_i$ for all $k \le i \le j$, i.e., we patched the earliest small jump.
Iterating this patching procedure infinitely often yields the desired algorithm.

□
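
To see what condition (ii) gives over many steps, one can chain it along indices that are three apart. The following short derivation is a sketch added for illustration; it assumes (ii) in the form $t'_{i+3} \ge (1 + 1/k)\,t'_i$ and uses that the times $t'_i$ are non-decreasing. The floor expression is bookkeeping introduced here and is not part of the theorem statement.

```latex
t'_{k+3j} \;\ge\; \Bigl(1 + \tfrac{1}{k}\Bigr)\, t'_{k+3(j-1)}
          \;\ge\; \dots \;\ge\; \Bigl(1 + \tfrac{1}{k}\Bigr)^{j}\, t'_k ,
\qquad\text{hence}\qquad
t'_i \;\ge\; \Bigl(1 + \tfrac{1}{k}\Bigr)^{\lfloor (i-k)/3 \rfloor}\, t'_k
\quad\text{for all } i \ge k .
```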

Lemma 9. For any optimal initial set $x = (x_1, \dots, x_k)$, there is an algorithm
$A = (t, d)$ such that (i) $q_{A,k+3} = \max_{j \in [k..k+3]} \bar{\ell}_{t_j}\,(k+1)/t_j \le q^*(k)$,
(ii) $t_{k+3} \ge t_k (1 + 1/k)$, and the set of checkpoints active at time $t_{k+3}$ is again optimal.

Proof. By the definition of optimality, for each $n \in \mathbb{N}$ there is an algorithm $A^{(n)}$
for $x$ that has discrepancy at most $q^*(k) + 1/n$. Let $(t^{(n)}_{k+1}, t^{(n)}_{k+2}, t^{(n)}_{k+3})$ denote
the corresponding next three checkpoints. By Lemma 8 we may assume that
$t^{(n)}_{k+3} \ge t_k (1 + 1/k)$ for all $n \in \mathbb{N}$.

Note that (using the same arguments as in Lemma 5) any algorithm having
discrepancy at most 2.5 satisfies $t_{k+i} \le 6^i t_k$ for any $k \ge 2$, since then
$1/(1 - 2.5/(k+1)) \le 6$. Hence,
$(t^{(n)}_{k+1}, t^{(n)}_{k+2}, t^{(n)}_{k+3})_{n \in \mathbb{N}_{\ge 2}}$ is a sequence in the compact space $[t_k, 6^3 t_k]^3$. This
sequence has a convergent subsequence with limit $(t_{k+1}, t_{k+2}, t_{k+3})$. Also, since
there are only finitely many values possible for $(d^{(n)}_{k+1}, d^{(n)}_{k+2}, d^{(n)}_{k+3})$, this
subsequence can be chosen such that this $d$-tuple is constant, say $(d_{k+1}, d_{k+2}, d_{k+3})$.
For this subsequence, also all $k + 1$ intervals existing at the three times of interest
converge. Consequently, the discrepancy caused by each of them also converges
to a value upper bounded by $q^*(k)$. This defines the three steps of algorithm $A$,
satisfying $q_{A,k+3} \le q^*(k)$.

Similarly, we observe that the set of checkpoints $x^{(n)}$ active at time $t^{(n)}_{k+3}$
when running algorithm $A^{(n)}$ has discrepancy at most $q^*(k) + 1/n$. Consequently,
by continuity of $q(\cdot)$, the active checkpoints we get from the limit checkpoints
$(t_{k+1}, t_{k+2}, t_{k+3})$ and deletions $(d_{k+1}, d_{k+2}, d_{k+3})$ are again optimal.

Finally, since all $t^{(n)}_{k+3} \ge t_k (1 + 1/k)$, this also holds for $t_{k+3}$. □

We are now in position to prove the main result of this section, Theorem 4. For
this, we repeatedly apply Lemma 9. We start with an optimal set of checkpoints $x$.
Then we run the algorithm delivered by Lemma 9 for three steps. This creates
no partial discrepancy larger than $q^*(k)$ and we end up with another optimal
set of checkpoints. From this, we continue to apply Lemma 9 and execute three
steps of the algorithm obtained. By Lemma 7, the partial discrepancy of the
combined algorithm is again at most $q^*(k)$. Iterating infinitely, this yields an
optimal algorithm, which proves Theorem 4.

8 Lower Bound



In this section, we prove a lower bound on the discrepancy of all checkpointing
algorithms. For large $k$ we get a lower bound of roughly 1.3, so we have a lower
bound that is asymptotically larger than the trivial bound of 1. Moreover, it
shows that algorithm Binary from Sect. 5 is nearly optimal, as for large $k$ the
presented lower bound is within 6% of the discrepancy of Binary.

Theorem 6. All checkpointing algorithms with $k$ checkpoints have a discrepancy
of at least

$$2 - \ln 2 - O(k^{-1}) \ge 1.306 - O(k^{-1}).$$

The remainder of this section is devoted to the proof of the above theorem.
Let $A = (t, d)$ be an arbitrary checkpointing algorithm and let $q' := \mathrm{Perf}(A)$ be
its discrepancy. For convenience, we define $q = kq'/(k + 1)$ and bound $q$. Since
$q < q'$ this suffices to show a lower bound for the discrepancy of $A$. For technical
reasons we add a gratis checkpoint at time $t_k$ that must not be removed by $A$.
That is, even after the removal of the original checkpoint at $t_k$, there still is the
gratis checkpoint active at $t_k$. Clearly, this can only improve the discrepancy.
We analyze the discrepancy of $A$ from time $t_k$ until it has deleted $k/(2q)$ of the initial
checkpoints.¹ More formally, we let $t'$ be the minimal time at which the number
of active checkpoints of $A$ contained in $[0, t_k]$ is $k - k/(2q)$. Note that we might
have $t' = \infty$, as the checkpointing algorithm $A$ might never delete $k/(2q)$ points from
$[0, t_k]$. However, in this case its discrepancy is lower bounded by 1.5.

Lemma 10. If $t' = \infty$, then $\mathrm{Perf}(A) \ge 1.5$.

Proof. Consider a large $i > k$ and the algorithm's discrepancy at time $t_i$. By
assumption, there are at most $k - k/(2q)$ active checkpoints in $(t_k, t_i]$. Hence,
by comparing with an equidistant spread we can bound the discrepancy (at time
$t_i$) by

$$\mathrm{Perf}(A) \ge \frac{k+1}{t_i} \cdot \frac{t_i - t_k}{k\,(1 - 1/(2q))} \ge \frac{2q}{2q - 1}\left(1 - \frac{t_k}{t_i}\right).$$

Letting $i \to \infty$, so that $t_i \to \infty$, we obtain

$$\mathrm{Perf}(A) \ge \frac{2q}{2q - 1} \ge \frac{2\,\mathrm{Perf}(A)}{2\,\mathrm{Perf}(A) - 1}$$

(by definition of $q$ and $x \mapsto \frac{2x}{2x - 1}$ being monotonically decreasing). This inequality
solves to the desired $\mathrm{Perf}(A) \ge 1.5$. □
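
The last step can be spelled out as follows. This is a short sketch, writing $P$ for $\mathrm{Perf}(A)$ (a shorthand introduced only here) and using $P \ge 1$, so that $2P - 1 > 0$:

```latex
P \;\ge\; \frac{2P}{2P - 1}
\;\iff\; P\,(2P - 1) \;\ge\; 2P
\;\iff\; 2P - 1 \;\ge\; 2
\;\iff\; P \;\ge\; \tfrac{3}{2}.
```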

Hence, in the following we can assume that $t' < \infty$. We partition the intervals
that exist at time $t'$ into three types:

1. Intervals existing both at time $t_k$ and $t'$. These intervals are contained in
$[0, t_k]$.

2. Intervals that are contained in $[0, t_k]$, but did not exist at time $t_k$. These
intervals were created by the removal of some checkpoint in $[0, t_k]$ after
time $t_k$.

3. Intervals contained in $[t_k, t']$.

Note that we need the gratis checkpoint at $t_k$ in order for these definitions to
make sense, as otherwise there could be an interval overlapping $t_k$.

¹ To be precise we should round $k/(2q)$ to one of its nearest integers. When doing so, all
calculations in the remainder of this section go through as they are; this only slightly increases
the hidden constant in the error term $O(k^{-1})$.

Let $\mathcal{L}_i$ denote the set of intervals of type $i$ for $i \in \{1, 2, 3\}$, and set $k_i := |\mathcal{L}_i|$.
Let $\mathcal{L}_2 = \{I_1, \dots, I_{k_2}\}$, where the intervals are ordered by their creation times
$\tau_1 \le \dots \le \tau_{k_2}$. Since each interval in $\mathcal{L}_2$ contains at least one deleted point we
have

$$k_2 \le \frac{k}{2q},$$

and we set $m := \frac{k}{2q} - k_2$. Then $m$ counts the number of deleted checkpoints in
$[0, t_k]$ that did not create an interval in $\mathcal{L}_2$, but some strict sub-interval of an
interval in $\mathcal{L}_2$. We call these $m$ removed checkpoints free.
We first bound the length of the intervals in $\mathcal{L}_1$ and $\mathcal{L}_2$.

Lemma 11. The length of any interval in $\mathcal{L}_1$ is at most $q t_k/k$.

Proof. As all intervals in $\mathcal{L}_1$ are already present at time $t_k$ and the algorithm
has discrepancy $q'$, we have for any $I \in \mathcal{L}_1$

$$(k + 1)\,|I|/t_k \le q' = (k + 1)\,q/k.$$

The bound follows. □

Lemma 12. The length of any interval $I_i \in \mathcal{L}_2$ is at most $\frac{t_k}{k/q - m - i}$.

Proof. As the algorithm has discrepancy $q'$, we know

$$|I_i| \le q\,\tau_i/k. \qquad (6)$$

In the following we bound $\tau_i$, the time of creation of $I_i$. At time $\tau_i$ there are at
most $m + i$ intervals in $\mathcal{L}_3$, since at most $m$ free checkpoints and $i$ checkpoints
from the creation of $I_1, \dots, I_i$ are available. Comparing with an equidistant
spread of $m + i$ checkpoints in $[t_k, \tau_i]$ and the algorithm's discrepancy, the longest
interval $L$ in $[t_k, \tau_i]$ (at time $\tau_i$) has length

$$\frac{\tau_i - t_k}{m + i} \le |L| \le \frac{q\,\tau_i}{k}.$$

Rearranging the outer inequality yields a bound on $\tau_i$ of

$$\tau_i \le \frac{k\,t_k}{k - (m + i)\,q}.$$

Substituting this into (6) yields the desired result. □
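
For reference, the substitution can be written out as follows. This is a sketch assuming the bound $\tau_i \le k\,t_k/(k - (m+i)\,q)$ on the creation time obtained from the rearrangement above; it only uses quantities already defined in this section.

```latex
|I_i| \;\le\; \frac{q}{k}\,\tau_i
      \;\le\; \frac{q}{k}\cdot\frac{k\,t_k}{k - (m+i)\,q}
      \;=\; \frac{q\,t_k}{k - (m+i)\,q}
      \;=\; \frac{t_k}{k/q - m - i}.
```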

Furthermore, we need a relation between $k_1$, $k$, $m$, and $q$.

Lemma 13. We have

$$k_1 = k + m - k/q + 1.$$

Proof. As the intervals in $\mathcal{L}_1$ and $\mathcal{L}_2$ partition $[0, t_k]$, there are $k_1 + k_2$ intervals
left in $[0, t_k]$ at time $t'$. Note that each but one such interval has its left endpoint
among the $k$ active checkpoints from time $t_k$ (the one exception having as left
endpoint 0). Hence, there are $k_1 + k_2 - 1$ checkpoints left in $[0, t_k]$. Comparing
with the number $k_2 + m$ of deleted checkpoints in $[0, t_k]$ until time $t'$ and their
overall number $k$ yields

$$(k_2 + m) + (k_1 + k_2 - 1) = k.$$

Rearranging this and plugging in $k_2 = \frac{k}{2q} - m$ (which holds by definition of $m$)
yields the desired result. □

Now we use our bounds on the length of intervals from $\mathcal{L}_1$ and $\mathcal{L}_2$ to find a
bound on $q$. Note that the intervals in $\mathcal{L}_1$ and $\mathcal{L}_2$ partition $[0, t_k]$, so that

$$t_k = \sum_{I \in \mathcal{L}_1} |I| + \sum_{I' \in \mathcal{L}_2} |I'|.$$

Using Lemmas 11 and 12 we obtain

$$t_k \le k_1 \frac{q\,t_k}{k} + \sum_{i=1}^{k_2} \frac{t_k}{k/q - m - i}.$$

Substituting $k_1$ using Lemma 13 yields

$$t_k \le [k + m - k/q + 1]\, q\,t_k/k + \sum_{i=1}^{k/(2q)-m} \frac{t_k}{k/q - m - i}
     = t_k \left( q - 1 + m\frac{q}{k} + O(k^{-1}) + \sum_{i=1}^{k/(2q)-m} \frac{1}{k/q - m - i} \right).$$

Recall that $H_n = \sum_{1 \le i \le n} i^{-1}$ is the $n$-th harmonic number. Rearranging the
above inequality yields

$$q \ge 2 - m\frac{q}{k} - O(k^{-1}) - H_{k/q-m-1} + H_{k/(2q)-1}.$$

Observe that we have $m\frac{q}{k} + H_{k/q-m-1} \le H_{k/q-1}$, implying

$$q \ge 2 + H_{k/(2q)} - H_{k/q} - O(k^{-1}),$$

since we can hide the last summands of $H_{k/(2q)}$ and $H_{k/q}$ by $O(k^{-1})$. In
combination with the asymptotic behavior of $H_n = \ln n + \gamma + O(n^{-1})$, where $\gamma$
is the Euler-Mascheroni constant, we obtain

$$q \ge 2 + \ln(k/(2q)) - \ln(k/q) - O(k^{-1}) = 2 - \ln(2) - O(k^{-1}).$$

This finishes the proof of Theorem 6.
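
The limiting value can be checked numerically. The following is a minimal sketch that evaluates $2 + H_n - H_{2n}$, where $n$ plays the role of $k/(2q)$, and compares it with $2 - \ln 2$. The helper names are hypothetical, and the comparison with $\ln 4$ assumes the asymptotic discrepancy bound stated for Binary.

```python
import math


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def lower_bound_estimate(n):
    """2 + H_n - H_{2n}, which tends to 2 - ln(2) as n grows."""
    return 2.0 + harmonic(n) - harmonic(2 * n)


limit = 2.0 - math.log(2.0)              # about 1.3069
estimate = lower_bound_estimate(10**5)   # close to the limit for large n
ratio_binary = math.log(4.0) / limit     # ln(4) bound of Binary vs. lower bound
```

Here `ratio_binary` is roughly 1.06, consistent with the remark that the lower bound is within 6% of the discrepancy of Binary for large $k$.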